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Abstract 

We consider a quadrature-based eigensolver to find eigenpairs of 
Hermitian matrices arising in lattice quantum chromodynamics. To 
reduce the computational cost for finding eigenpairs of such Hermitian 
matrices, we propose a new technique for solving shifted linear systems 
with complex shifts by means of the shifted CG method. Furthermore 
using integration paths along horizontal lines corresponding to the real 
axis of the complex plane, the number of iterations for the shifted CG 
method is also reduced. Some numerical experiments illustrate the 
accuracy and efficiency of the proposed method by comparison with a 
conventional method. 

1 Introduction 

Eigenproblems arise in many scientific applications and in some cases, only 
a limited set of eigenpairs is needed. For example, to calculate all-to-all 
propagators in lattice quantum chromodynamics (QCD) [1], it is known that 
the contribution of some low-lying eigenvalues of a large sparse Hermitian 
matrix called "Hermitian fermion matrix" is dominant. 

For such eigenproblems, the Implicitly Restarted Lanczos method or 
generally, the Implicitly Restarted Arnoldi method (IRAM) [2] is one of the 
conventional choices. On the other hand, to find eigenvalues in a given re- 
gion and corresponding eigenvectors with contour integrations, the Sakurai- 
Sugiura (SS) method [3] has been proposed. The SB method translates a 
problem of finding eigenvalues in a domain surrounded by an integration 
path into a problem of solving systems of linear equations for some matrices 
with shifts corresponding to quadrature points of the contour integration. 



Therefore solving shifted hnear systems efficiently plays an important role 
of high performance of the SS method. In this paper, we improve the SS 
method by reducing computational cost for solving shifted linear systems. 

To solve such shifted linear systems efficiently, there are some ways. One 
is solving each linear system in parallel since all shifted linear systems arising 
in the SS method can be solved independently. This high parallelism is one 
of the important feature of the SS method. Another way is reducing matrix- 
vector multiplications in a Krylov subspace linear solver. We consider the 
latter in this paper. 

To that end, we propose following two ideas: first we adopt a Krylov 
subspace method for shifted linear systems [1] which needs matrix-vector 
multiplications only for one seed linear system to solve all shifted linear 
systems because of shift invariance of Krylov subspace. Moreover we show 
that the shifted CG method, which needs no more than one matrix-vector 
multiplication in each iteration, can be applied to solve such shifted linear 
systems if a coefficient matrix of a seed linear system is Hermite although 
coefficient matrices of the other shifted systems are non-Hermite. We show 
the detail in Section 2. Second, in Section 3, we consider appropriate con- 
figurations of quadrature points for less iterations for a Krylov subspace 
solver because the number of iterations depends on a shift parameter cor- 
responding to each quadrature point. Section 4 shows some numerical test 
to investigate the properties of the proposed method and its efficiency by 
comparison with PARPACK [5], the software package to solve eigenvalue 
problems with IRAM in parallel, with a Hermitian fermion matrix of lattice 
QCD and Section 5 concludes. 

2 The SS method with a Krylov subspace method 
for shifted Unear systems 

2.1 The SS method 

We consider an eigenvalue problem Ax = Xx where A £ ([^^x"- a Hermitian 
matrix and (A, x) is an eigenpair of A. Let F be a positively oriented closed 
Jordan curve in the complex plane and introduce a contour integration 



where / is an n x n unit matrix and v € C" is any nonzero vector. Accord- 
ing to the residue theorem, has only the contribution corresponding to 
eigenvalues inside T. 



In the moment based method [3J, a moment fik = I'^fe is defined and let 
the Hankel matrix G C'"^™ and the shifted Hankel matrix if< G C"*^"* 
be 
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respectively, where m is the number of eigenvalues inside T. Here let S = 
[sq,--- ,Sm-i] G C"^™", eigenvalues of the pencil {H^,Hm) are given by 
Ai,--- , Am and an eigenvector corresponding to A; is given by xi = Sui 
where ui is an eigenvector of {H^,Hm)- 

On the other hand, in a Rayleigh-Ritz type approach [6], constructing an 
orthonormal basis Q G C"^™ via the orthogonalization of S, approximate 
eigenvalues are given by the Ritz values of a projected matrix pencil {A, B) 
where A = Q^AQ G C^^™ and B = Q^Q G C"^™, respectively, and 
corresponding eigenvectors are given by xi = Qwi where wi is an eigenvector 
of {A,B). 

The Rayleigh-Ritz projection method is rather accurate than the moment- 
based method, however there is trade-off between accuracy and memory 
consumption. 

To calculate ([1]) numerically, the A'^-point trapezoidal rule is applied and 
we approximate by 

N-l 
j=0 

where Zj and wj are a quadrature point and a weight, respectively, and 
(j = [zj — 7)//0 is a normalized quadrature point satisfying the condition 
— 1 < Re Q < 1 with a shift parameter 7 G C and a scale parameter p > 0. 
In the case of an integration on a circle C with a center 7 and a radius p, a 
quadrature point and a weight are defined by 

zj=-f + p e^(^'+i/2)^ j = 0, 1, • • • , TV - 1, 

and 

^j = ^^^ j = 0,l,--- ,iV-l, 

respectively. Here let rji = {Xi — ^)/p and suppose v = J^i^n^h we can 
rewrite ^ ss Sk = Yli fkivi)'^i^i/ P by means of a filter function defined by 

In the case of a unit circle, it has been shown that fk{x) = + x^) 

[7] and it suppresses as 0{\x\~'^~^^) outside the circle. This means that 
Sfc has nonnegligible contribution corresponding to the eigenvalues outside 
the circle due to the approximation of the contour integration. When we 
construct 5 = [sq, • ■ ■ , sm-i], M should be more than m. 

Note that a block version of the SS method is proposed in [7], i.e. S is 
extended to C"x(A^x^) with L different arbitrary nonzero vectors vi, - ■ ■ ,vl. 
Using this method, we can obtain L degenerate eigenvalues and it is known 
that the accuracy is higher than just increasing M. 
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2.2 The SS method with the shifted CG method 



To calculate ([2]), shifted linear systems such that 

[zjI-A)y^=v (4) 

should be solved for each quadrature point Zj. In what follows, we propose 
some ideas to reduce the computational cost to solving with the shifted 
CG method [1]. 

It is known that there is a shift invariance of Krylov subspace /C^ (j4, ro) = 
span(ro, Atq, • • • , A^~'^rQ) with any shift o" G C such that 

Kk{A,ra)=lCk{A + aI,ro). (5) 

In a Krylov subspace linear solver such as the CG method, matrix-vector 
multiplications should be performed to update a residual vector G /Cfc+i (^j ^o)- 
Because of ([5]), a residual vector G lCk+i{A + a/, ro) corresponding to a 
matrix A + al can be given by 

with some scalar namely once a residual vector of a seed linear system 
rk is calculated with matrix-vector multiplications, residual vectors of any 
other shifted linear systems r^, what is more, corresponding solution vec- 
tors are given without additional matrix- vector multiplications. Suppose 
the computational cost of matrix- vector multiplications is dominant, the 
computational cost of the SS method is drastically reduced by 

In addition, we show that only one matrix-vector multiplication in each 
iteration is needed to solve shifted linear systems when a coefficient matrix 
consists of a Hermitian matrix with any shift. Consider the BiCG method 
to solve a system of linear equations with a coefficient matrix al — A. The 
BiCG method needs two matrix-vector multiplications in each iteration to 
update a residual vector £ ICk+i{crI — A,ro) and its shadow residual 
vector rl e ICk+i{{crI - A)^,rQ). When A is Hermite, we find that 

{al - A)^ = {al -A) + {a- a)I. 

Because of the shift invariance ([5]), is calculated by without matrix- 
vector multiplications and accordingly, the computational cost is reduced by 
half, if Tq = tq. Applying this technique to the shifted BiCG method, we can 
solve many shifted linear systems with only one matrix- vector multiplication 
in each iteration. If a shift a for the seed system is real, {al — Af^ = al — A, 
i.e. the seed system and its shadow system are coincident. In this case, we 
can apply the CG method to solve the seed system. This means that any 
shifted linear systems with arbitrary complex shift can be solved with the 
shifted CG method when a coefficient matrix corresponding to a seed system 
consists of a Hermitian matrix with some real shift. For the above reason, 
we adopt the shifted CG method in this paper. 
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Figure 1: The distribution of eigenvalues for the Hermitian fermion matrix 
which we use in Section 4 and the contour of the number of iterations for 
the shifted CG method for Zjl — A with tolerance for the relative residual 
||rfe||2/||t^||2 < 10-12. 



3 Integrations along straight lines 

Empirically, the number of iterations for the shifted CG method depends 
on distribution of eigenvalues near zero. In terms of shifted matrix zjI — A, 
the number of iterations for the shifted CG method tends to increase if the 
number of eigenvalues of A close to Zj becomes larger. Fig. [1] shows the 
distribution of eigenvalues for the Hermitian fermion matrix which we use 
in Section 4 and the contour of the number of iterations for the shifted CG 
method to solve a shifted matrix zjI — A with tolerance for the relative 
residual ||T"fc||2/||i'||2 ^ lO""'^^. Actually, the number of iterations for the 
shifted CG method gets larger as zj becomes closer to the real axis and 
its absolute value of real part becomes larger, which is consistent with the 
expectation from the distribution of eigenvalues as mentioned above. Then 
to reduce the number of iterations for the shifted CG method, quadrature 
points should be as far from the real axis as possible as accuracy of eigenpairs 
is enough for applications. 

In order to control the number of iterations for the shifted CG method 
and accuracy of eigenpairs, we introduce a new integration path as follows: 
let be two horizontal lines such that 

: z = -f + p{x± -1 < x < 1, 

where 7 is real. Then N/2 equally-spaced quadrature points zq, Z2-, - ■ ■ , zisj^2 
and 2;i,Z3,--- ,zn-i are located on L"*" and L~, respectively. Using this 
integration path, distance between a quadrature point and the real axis 
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Figure 2: Filter functions fo{x) for = 32 quadrature points on L with 
/3 = 0.2,0.6,1.0,1.4. For comparison, /o(x) in the case of C is also shown 
as a solid line. 



depends only on /3p unlike the case of C. In this set of weights 

for a integration {wq^wi, ■ ■ ■ ,W]\f-i} is defined by solving following linear 
equations, 



Fig. [2] shows filter functions /o (x) defined by ([3]) for = 32 quadrature 
points on with /3 = 0.2,0.6,1.0,1.4. For comparison, fo{x) in the case 
of C is also shown. In the case of C, foix) has a plateau in [-1,1] and 
exponentially suppresses in the other region. However in the case of L^, 
there is no plateau and the slope of dumping parts depends on f3 which 
controls the size of the gap between and L^. Therefore it is expected 
that the accuracy of eigenvalues is the highest right in the middle of and 
gets lower at the edges, so we accept the eigenvalues only in the rectangular 
formed by A^' quadrature points near the center 7 where A^' > 4 is an even 
number. 

Note that when we introduce more than two integration paths lying next 
to each other L^, L^, • • • which have the same A^, p and f3, quadrature points 
can be reused by just shifting them from one integration path to another 
IL'fe+i by A^' — 2 points. This is advantage of using the integration path L^. 

4 Numerical experiments 

A Hermitian fermion matrix is defined as A = 75(1 — kD) where k is a 
hopping parameter and -D is a complex non-symmetric sparse matrix ex- 
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Table 1: The efficiency and the accuracy of the SS method with C and L- 
for calculating 6 low- lying eigenvalues and corresponding eigenvectors. 



path 


C 


L± 


13 




0.2 




0.6 




1.0 




# matvec 


4352 




4268 




3560 




2886 




time [sec] 


399.1 




424.8 




355.5 




293.2 




resi 


1.8x10" 


11 


5.2x10" 


10 


3.1x10" 


-8 


1.5x10" 
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res2 


2.8x10" 


10 


1.8x10" 


12 


1.2x10" 


-9 


3.0x10" 
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res3 


4.0x10" 


10 


1.9x10" 


12 


1.6x10" 


-9 


3.3x10" 
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res4 


2.9x10" 


10 


3.0x10" 


12 


7.1x10" 


10 


1.1x10" 
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res5 


5.6x10" 


11 


2.5x10" 


12 


4.0x10" 


10 


6.7x10" 
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res6 


7.8x10" 


12 


6.6x10" 


10 


1.6x10" 


-8 


5.8x10" 


7 



plained in [8]. 75 is one of the Dirac 7 matrices. We employ the lattice size 
12^^ X 24 which corresponds to a 497, 664 dimensional matrix with 25, 380, 864 
nonzero components. Our choice of k = 0.13600 is rather close to the critical 
value Kc = 0.136116(8). 

Our experiments are carried out on a single node of T2K-TSUKUBA 
which has totally 648 nodes providing 95.4 Tflops of computing capability. 
Each node has 4 sets of a 2.3 GHz Quad-Core AMD Opteron Model 8356 
processor and a 8 GBytes DDR2-667 memory. 

Let us first compare the efficiency of the SS method between the conven- 
tional integration path C and the new integration path by calculating 
low-lying 6 eigenvalues and corresponding eigenvectors. In both cases, we 
set 7 = 0.0004, p = 0.0102, = 32 and M = 24. The gap size between L"^ 
and L~ is varied as /3 = 0.2, 0.6, 1.0. We employ the shifted CG method with 
the stopping criterion for the relative residual ||i"fc||2/||i^||2 < 10""*^^ choosing 
a shift (7 = for the seed. Eigenpairs are obtained with the Rayleigh-Ritz 
projection method. 

We show the efficiency and the accuracy of the SS method with C and 

in Table [U where res/ = \\Axi — XiXi\\2, \\xi\\2 = 1 is the residual for 
the l-th. smallest eigenvalue A; . In the case of , we obtain the lower accu- 
racy of eigenpairs toward the edges of the integration interval as expected in 
Sec. 3. We also find that accuracy of eigenpairs is increased at the cost of the 
number of matrix-vector multiplications as /? becomes smaller. This allows 
us to choose an optimal value of (3 to minimize the computational cost for 
the required precision of eigenpairs. We observe similar efficiency between 
the C and cases. An intriguing finding is that the elapsed time for 
with /3 = 0.2 is larger than that of C even if the number of matrix-vector 
multiplications of the former is less than that of the latter. The reason 
is that the vector operations in the shifted CG method require nonnegligi- 
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Table 2: The efficiency and the accuracy of the SS method with C and 
and PARPACK for calculating 20 low-lying eigenvalues and corresponding 
eigenvectors. 





SS (C) 


SS (L±) 


PARPACK 


# matvec 


7542 


6680 


7384 


Total # quad, points 


128 


102 




Total time [sec] 


1159.4 


1020.7 


1277.4 


Time for matvec [sec] 


421.2 


365.5 


406.1 


reSjnax 


7.7x10-1° 


2.6x10-1° 


2.8x10-12 


reSjnin 


1.4xl0"i2 


7.7x10-13 


3.6x10-15 



ble computational cost compared to the matrix-vector multiplication which 
contains only 51 nonzero components in each row in our case. 

We also compare the efficiency between PARPACK and the SS method 
with C and by calculating 20 low-lying eigenvalues and correspond- 
ing eigenvectors. The parameters of the SS method are chosen to sat- 
isfy the tolerance res; < 10-^ and the stopping criterion for PARPACK 
tol = 10-1°. For the SS method with C we employ 4 circles with N = 
32, M = 24 choosing 7 = -0.02485,-0.00964,0.01181,0.02575 and p = 
0.00435, 0.00960, 0.00860, 0.00431, each of which contains 5 eigenvalues. The 
SS method with uses 6 pair of hues with iV = 32, M = 24, N' = 16, 
p = 0.02121 and /3 = 0.2. The other setup for the SS method is the same as 
the previous experiment. For PARPACK the number of the Arnoldi vectors 
is chosen to be four times the number of eigenvalues, i.e. 80, in the regular 
mode. 

Table [2] shows the efficiency and the accuracy of three methods. reSmax 
and the maximum and minimum value of res;, respectively. There 

are two important points. One is that the SS method shows similar or better 
efficiency and accuracy in comparison with PARPACK thanks to the shifted 
CG method which reduces the number of matrix-vector multiplications by 
about 1/100. Another is that the SS method with needs less numbers 
of matrix- vector multiplications and quardrature points compared to the C 
case. This is because the case requires less number of iterations for the 
shifted CG method and allows us to reuse the quadrature points. 

5 Conclusions 

We introduce a shifted Krylov subspace method to reduce the computational 
cost for the SS method. Moreover we propose a new integration path along 
straight lines which decreases both the number of iterations for a Krylov 
subspace solver and the number of quadrature points. 
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We calculate some low-lying eigenvalues and corresponding eigenvectors 
of a Hermitian fermion matrix with the SS method and PARPACK. We show 
that the SS method becomes efficient comparable with PARPACK thanks 
to the shifted CG method and our new integration path is more efficient 
than the conventional one on a circle. 

Investigating more efficient integration paths and quadrature rules to 
reduce computational cost for the SS method is our future plan. 
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